The skillIQ and roleIQ tests are addictive. I haven’t used Pluralsight to learn and improve my technical skills yet, but I can see how the assessments would drive interaction and encourage subscribers to keep improving. What a fun way to encourage personal and professional development!
We can evaluate how the algorithm decides to stop asking questions using a time series of each assessment. The obvious guess is a minimal threshold on the question-to-question change in the rd value. Observing a random sample of several user_assessment_session_ids confirms something very close to that guess.
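That initial guess can be stated as a tiny function. This is a Python sketch with hypothetical names; the `min_delta` value is an assumption for illustration, not a threshold taken from the data:

```python
def stop_index_by_rd_delta(rd_values, min_delta=1.0):
    """Return the 1-based question number at which a delta-based stopping
    rule would fire, or None if it never fires.

    rd_values: the session's rd after each answered question, in order.
    min_delta: hypothetical floor on the question-to-question change in rd.
    """
    for i in range(1, len(rd_values)):
        if abs(rd_values[i] - rd_values[i - 1]) < min_delta:
            return i + 1  # stop after answering question i+1
    return None  # rule never fired (e.g., user quit the session early)
```

Plotting where this rule fires against the observed session lengths is one way to check the guess against the sampled sessions.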
It’s probably worth checking the other metrics associated with a session (display_score, percentile, and ranking) to confirm our suspicion that rd is the main variable driving the algorithm. The plots below of the same three assessment sessions show that rd is the only one of the four metrics that looks like an appropriate candidate.
A closer look at the distribution of the minimum rd values of each assessment’s interaction shows that a simple threshold of 80 drives the stopping rule. Over 75% of the sessions were stopped at an rd value below and very near 80. While that seems like an arbitrary value to me, I am sure empirical and theoretical studies were performed to determine that threshold. Also, 75% may seem low, but that includes all sessions, even those that were stopped prematurely by the user (as discussed in #3).
## 0% 5% 10% 15% 20% 25% 30%
## 77.99380 78.34009 78.42960 78.51234 78.59056 78.65580 78.70929
## 35% 40% 45% 50% 55% 60% 65%
## 78.76540 78.82022 78.87147 78.93500 79.00305 79.08484 79.19052
## 70% 75% 80% 85% 90% 95% 100%
## 79.34391 79.65148 94.55070 124.50190 156.89920 202.17270 256.61200
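The quantile table above can be reproduced in concept as follows. This is a sketch; `sessions` is a hypothetical mapping from session id to that session’s ordered rd values, and unlike the R output above the endpoints (0% and 100%) are excluded:

```python
import statistics

def min_rd_quantiles(sessions):
    """Quantiles of each session's minimum rd value, in 5% steps.

    sessions: hypothetical dict mapping session id -> list of rd values.
    Returns the 19 cut points at 5%, 10%, ..., 95%.
    """
    mins = [min(rds) for rds in sessions.values()]
    return statistics.quantiles(mins, n=20, method="inclusive")
```

The sharp jump between the 75% and 80% cut points (79.65 to 94.55) is what makes the 80 threshold visible in the table.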
## # A tibble: 2 x 2
## rd_threshold `n()`
## <dbl> <int>
## 1 0 1608
## 2 1 5070
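The two-row count above (sessions that did versus did not reach the floor) can be sketched like this. Python sketch with hypothetical names; `rd_floor=80` comes from the observation above:

```python
from collections import Counter

def count_completed(sessions, rd_floor=80.0):
    """Counter mapping 1 (min rd dipped below the floor, i.e. the stopping
    rule fired) or 0 (it did not, e.g. the user quit early) to session counts.

    sessions: hypothetical dict mapping session id -> list of rd values.
    """
    return Counter(int(min(rds) < rd_floor) for rds in sessions.values())
```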
## # A tibble: 32 x 3
## # Groups: rd_threshold [?]
## rd_threshold n_questions_answered `n()`
## <dbl> <int> <int>
## 1 0 0 271
## 2 0 1 211
## 3 0 2 189
## 4 0 3 165
## 5 0 4 112
## 6 0 5 127
## 7 0 6 101
## 8 0 7 100
## 9 0 8 70
## 10 0 9 54
## # ... with 22 more rows
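The grouped tibble above extends the same idea by also keying on the number of questions answered. A minimal sketch, again assuming the hypothetical `sessions` structure:

```python
from collections import Counter

def count_by_completion_and_length(sessions, rd_floor=80.0):
    """Mirror the grouped count above: map
    (reached_floor, n_questions_answered) -> number of sessions.

    sessions: hypothetical dict mapping session id -> list of rd values,
    one rd value per answered question.
    """
    return Counter(
        (int(min(rds) < rd_floor), len(rds)) for rds in sessions.values()
    )
```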
There are 724 questions in the dataset. I expect the rd metric to again indicate the certainty floor. A quick look at the distribution of rd values shows that floor to be 30. However, many (%) of the assessment_item_ids show all of their rd values equal to 30. Perhaps those are older questions that reached the floor (30) before the start of this dataset.
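Identifying those already-converged questions is straightforward. A sketch, where `item_rds` is a hypothetical mapping from assessment_item_id to that question’s recorded rd values:

```python
def items_all_at_floor(item_rds, rd_floor=30.0):
    """Return the ids whose every recorded rd already sits at the floor,
    consistent with older questions that converged before this dataset.

    item_rds: hypothetical dict mapping assessment_item_id -> list of rd values.
    """
    return [item for item, rds in item_rds.items()
            if all(rd == rd_floor for rd in rds)]
```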
We would really like to look at all 724 of these questions. We can examine much of the structure using trelliscopejs, a tool for interactively viewing a large collection of visualizations. The key advantage of trelliscope is that it lets us build a rich set of features for sorting and filtering through the data, helping us see nuances, outliers, and important features of that data.
A brief description of the cognostics (features) is available by clicking the “i” in the upper left corner. You can search for interesting assessment_item_ids using the Sort and Filter buttons on the left hand side. For example:

1. To see the assessment_item_ids with rd values other than 30, click the Filter button, then the “All RD values = 30” pill, and enter “0” on the right side. This reduces the total number of panels from 724 to 209.
2. To see only panels (plots) where at least two points are present (and thus a plot is created), stay in the Filter view, click the “Number of Question Interactions” pill, and enter 2 on the left hand side of the range selection. This immediately removes all the blank panels (not plotted because only one observation exists) and reduces the number of panels from 209 to 180.
3. Clicking the Filter button again closes that window.

You can sort or filter further to test hypotheses or explore the data sliced by assessment_item_id. Happy exploring!